Coupling Fault Diagnosis Based on Dynamic Vertex Interpretable Graph Neural Network

Mechanical equipment is composed of several parts, and the interaction between parts exists throughout the whole life cycle, leading to the widespread phenomenon of fault coupling. The diagnosis of independent faults cannot meet the requirements of the health management of mechanical equipment under actual working conditions. In this paper, the dynamic vertex interpretable graph neural network (DIGNN) is proposed to solve the problem of coupling fault diagnosis, in which dynamic vertices are defined in the data topology. First, in the date preprocessing phase, wavelet transform is utilized to make input features interpretable and reduce the uncertainty of model training. In the fault topology, edge connections are made between nodes according to the fault coupling information, and edge connections are established between dynamic nodes and all other nodes. Second the data topology with dynamic vertices is used in the training phase and in the testing phase, the time series data are only fed into dynamic vertices for classification and analysis, which makes it possible to realize coupling fault diagnosis in an industrial production environment. The features extracted in different layers of DIGNN interpret how the model works. The method proposed in this paper can realize the accurate diagnosis of independent faults in the dataset with an accuracy of 100%, and can effectively judge the coupling mode of coupling faults with a comprehensive accuracy of 88.3%.


Introduction
A mechanical structure is affected by temperature, vibration, humidity, electromagnetic, shock, and other external stresses during operation.The external stresses act on a large number of components at the same time and affect their reliability synchronously [1].Meanwhile, during the operation of the equipment, there are internal stresses such as friction and collision between the equipment [2], and their reliability is influenced by each other.Under the same working conditions, there are widespread cases of multi-fault coupling derivation in mechanical equipment [3,4]; furthermore, the coupling of faults may aggravate the derivation speed of faults.Therefore, the accurate identification of coupling faults can diagnose and warn of the occurrence of faults in advance, providing an important reference for maintenance decisions, which has an important application value [5].
Recently, scholars have made important contributions to the research on coupling the fault diagnosis of rotating machinery.Some scholars have studied the frequency domain characteristics of vibration signals and classified multiple faults based on the frequency domain characteristics.In coupling fault diagnosis, fault classification is usually carried out by decoupling, and bearing fault features are extracted.Mishra et al. collected bearing vibration signals under uncertain speed conditions and proposed the STFNet network to extract vibration features and classify faults at different rotating speeds [6].

Tao et al. proposed a cross-domain diagnosis method based on time-frequency domain
Sensors 2024, 24, 4356 2 of 16 information fusion, which uses the joint distribution distance to cluster fault features in the time-frequency domain [7].Zhang et al. selected three indices insensitive to defect position as feature vectors for bearing fault diagnosis, and used the KNN algorithm for fault classification [8].Dezun et al. proposed iterative generalized demodulation (IGD) to extract the feature frequency of instantaneous faults and carried out feature extraction and fault diagnosis for bearing faults with multiple faults and variable rotating speed [9].Neisi et al. used UKF to estimate the vibration characteristics of rotating machinery to achieve data fitting and fault identification under multi-fault conditions [10].Wang et al. selected the data channel by the Pearson correlation coefficient combined with the GRU method for multiple fault diagnosis, and time judged the severity of the fault [11].
Some other scholars have used the neural network data-driven method to classify the multiple faults of rotating machinery.By using the self-consistency of the convergence model [12], the neural network arranges the chaotic model parameters through gradient descent to make them show functional consistency on a macro level, and is widely used in the research of the fault diagnosis of mechanical equipment.However, the black box attribute of the neural network brings difficulties to the interpretation of the model.Although the neural network has a strong performance, it reduces the reliability of the model, which has a negative impact on the engineering application of the neural network to some extent.In the study of the fault diagnosis of rotating machinery, Zhang et al. diagnosed bearing multiple faults by improving the CNN network pool layer and fully connected layer structure [13].Gong et al. used the finite element simulation method to expand the non-equilibrium data of bearing multiple faults, and the transfer learning method was used to diagnose the faults [14].Deng et al. proposed the MgNet model to diagnose multi-bearing faults at the system level by collecting vibration signals of the auxiliary bearings [15].Shi et al. proposed the WDCNN method to classify bearing faults by introducing the multi-scale large convolutional kernel and GRU network, and divided the bearing life cycle into the health stage and fault stage [16].Hadi et al. determined the structure of the fault diagnosis neural network through reinforcement learning and other methods, which further veiled the explainability of the model [17].
On the cutting edge of machine learning, the generative large model is widely used to manufacture and output knowledge, but its reliability is difficult in terms of meeting the high accuracy of fault diagnosis and other engineering fields [18].In order to improve the reliability of the model, scholars have carried out research on the interpretability of the model.In the study of interpretable neural networks, Zhang et al. [19] classified interpretable models according to three dimensions: type of engagement (passive and active interpretation methods), the type of explanation, and the focus (from local to global explainability).Li et al. used the U-Net network and integrated signal segmentation technology to mine interpretable fault characteristic information to realize fault diagnosis [20].Che et al. used the multi-domain element transfer learning framework based on deep convolutional neural networks to reverse reconstruct the fault classification results through the fusion of hierarchical class activation mapping to obtain the mapping relationship between the input and output, which is a global ante hoc interpretable model with feature decomposition [21].The DIGNN model proposed in this paper is a global ante hoc interpretable model with rule extraction.
The graph neural network was proposed in 2009 [22].Due to its unique network topology, it has the ability to represent graph data and analyze related data.Currently, it is mainly used in natural language processing [23,24], traffic flow analysis [25,26], molecular structure modeling [27,28], and other fields.According to its structure, it can be divided into two types: spatial-based and spectral.The spectral-based method uses the graph convolution filter to denoise the node data and extract the main features of the signal.Michael et al. [29] extended CNN to the field of graph signal processing, and established convolution check data based on spectrum to carry out convolution operations, reducing its operational complexity.Defferrard et al. [30] improved the efficiency of graph coarsedness when CNN was generalized to GNN by establishing the operation of balanced binary tree Sensors 2024, 24, 4356 3 of 16 record graph coarsedness.The spatial based method is used to aggregate information of the node neighborhood.Gama et al. [31] established the input time series by means of graph aggregation to implement CNN transplantation for unstructured graph data.
Li et al. [32] first applied the graph neural network to fault diagnosis research and established relevant research benchmarks.This method converted time series single-channel data into graph structure to respectively realize fault diagnosis of rotating machinery.However, the topology structure of the graph data constructed by this method lacks physical meaning and has poor interpretability.Gao et al. used GNN to conduct one-shot model training and diagnosis for bearing fault types under varying working conditions [33].Man et al. [34] took the data collected by multiple sensors as the nodes in the figure respectively, and used GAT to diagnose the fault of the train steering gear.[35].The methods mentioned above fix the topology structure of graph data, which is difficult to adapt to the uncertain topology structure caused by the unknown fault type input of real-time data in engineering applications.The methods mentioned run in black box mode, but they still fully embody the excellent performance of the graph neural network in the field of fault diagnosis.In this paper, an interpretable DIGNN method was proposed based on the graph neural network, which can effectively utilize the characteristics of node correlation to realize the coupling fault diagnosis of rotating machinery.The main contributions of this paper are as follows: (a) A topology construction method of dynamic vertex data for graph neural networks is proposed, which is suitable for topology-based correlation analysis.(b) An explainable coupling fault diagnosis method is proposed, which gives physical meaning to the data-driven method based on the graph neural network.(c) In this paper, the bearing coupling fault was analyzed as an example, and the test results show that the method can realize coupling component analysis on the basis of coupling fault diagnosis.
The rest of this paper is organized as follows.Section 2 describes an interpretable DIGNN method based on the graph neural network with the characteristic of node correlation for coupling fault diagnosis.In Section 3, the overall process of the DIGNN algorithm is described, and the pre-processing scheme of wavelet transform and the network structure of DIGNN are specifically described, and the interpretability of the model is analyzed.In Section 4, the dataset used in this paper is first introduced in Section 4.1, then the data are pre-processed using wavelet transform, and the graph data containing the fault topology are constructed in Section 4.2.Then, the method proposed in this paper is used to diagnose the rotating mechanical coupling fault compared with other benchmark methods in Section 4.3.The classification of model coupling faults is explained and analyzed.Finally, conclusions are drawn in Section 5.

Interpretability of Graph Neural Networks
In the fault diagnosis of independent fault modes, only the feature extraction of the single fault mode needs to be considered.While under the condition of multiple fault coupling, there are correlation features between coupled fault modes, and the traditional CNN network cannot extract the implicit correlation between multiple faults.Graph neural networks are able to propagate relevant information between nodes through edges, thus providing interpretability to coupled fault diagnosis.Some notations used in this paper are shown in Table 1: In graph G = (V, ε), V represents the vertices in the graph; ε is the edge connecting the vertices; ε = e ij x i , x j ∈ V , ε ≤ N 2 .Let A represent the symmetric adjacency matrix, D denote the degree matrix, and X ∈ R N×d be the input data.N and d are the number of nodes and the feature dimension, respectively.L = D − A is a symmetric matrix Laplacian matrix, and the Fourier basis U is obtained by the spectral decomposition of L.
The matrix obtained by spectral decomposition is orthogonal, UU T = E, L = UΛU T , the Fourier transform of x is x = U T x, and its inverse transformation is x = U x, the continuous orthogonal basis on the graph is converted to the basis of the Fourier transform for graph convolution operation: where g θ = U T g = g θ (Λ) denotes the diagonal matrix and ⊙ is the Hadamard product.The Chebyshev polynomial into the inner product of the vector.Then, the ChebyNet expression is obtained, and the nodes in the K − hop neighborhood are aggregated as follows: where L = 2/λ max (L − I N ) is the standardized Laplacian eigenvalue and λ max is the largest eigenvalue.Take K = 2 to simplify ChebyNet further: Sensors 2024, 24, 4356 , so the GCN interlayer formula [36] is obtained as follows: where σ is the nonlinear mapping function, W l is the weight matrix of the lth layer, and X l is the input variable of the lth layer (X 0 = X).A = A + I N is the adjacency matrix of adding node self-information, and the aggregation of node information in the node-hop neighborhood is realized by symmetric normalization of the adjacency matrix D − 1 2 AD − 1 2 .Through k layers of the GCN network, the information of vertices within k − hop are aggregated.The symmetric standardization defines the direction of information transmission between nodes, which not only avoids the difference in data measurement scale caused by the difference in node degree, but also considers the amount of information of the two nodes connected by the edge [37].
The traditional GCN algorithm uses a data-driven approach to build data topology, which can be modified when information is transferred between layers, as shown in Figure 1.
( ) Defining , so the GCN interlayer formula [36] is obtained as follows: where  is the nonlinear mapping function, l W is the weight matrix of the th l layer, and l X is the input variable of the th l layer ( is the adjacency matrix of adding node self-information, and the aggregation of node information in the node-hop neighborhood is realized by symmetric normalization of the adjacency matrix are aggregated.The symmetric standardization defines the direction of information transmission between nodes, which not only avoids the difference in data measurement scale caused by the difference in node degree, but also considers the amount of information of the two nodes connected by the edge [37].
The traditional GCN algorithm uses a data-driven approach to build data topology, which can be modified when information is transferred between layers, as shown in Figure 1.

Data Preprocessing
In this paper, the coupling fault was indicated by a node in the graph, and the coupling topology was constructed by connecting the coupling fault node with a single fault node.In order to provide the ability of fault diagnosis analysis and diagnosis of new data, this paper took the fault to be classified as a node in the graph, which is called a dynamic vertex, and constructed the edge vector between the node and the other nodes.In actual condition, bearings and gear structures often work at the same time, and their vibration signals affect each other, as illustrated in Figure 2. Comparing the bearing data doped with gear signals in the XJTU Gearbox dataset with the pure bearing vibration signals in the XJTU-SY Bearing datasets, the rotation of gears will generate high-frequency signals, which will affect the simplicity of the bearing characteristics.Therefore, this paper also took the gear faults as nodes to provide negative samples for the fault diagrams and reduce the impact of gear faults on the fault diagnosis results [38].

Data Preprocessing
In this paper, the coupling fault was indicated by a node in the graph, and the coupling topology was constructed by connecting the coupling fault node with a single fault node.In order to provide the ability of fault diagnosis analysis and diagnosis of new data, this paper took the fault to be classified as a node in the graph, which is called a dynamic vertex, and constructed the edge vector between the node and the other nodes.In actual condition, bearings and gear structures often work at the same time, and their vibration signals affect each other, as illustrated in Figure 2. Comparing the bearing data doped with gear signals in the XJTU Gearbox dataset with the pure bearing vibration signals in the XJTU-SY Bearing datasets, the rotation of gears will generate high-frequency signals, which will affect the simplicity of the bearing characteristics.Therefore, this paper also took the gear faults as nodes to provide negative samples for the fault diagrams and reduce the impact of gear faults on the fault diagnosis results [38].For rotating machinery, the vibration signal characteristics are mainly determined by the vibration direction, rotation frequency, and vibration amplitude.In order to improve the interpretability of the method and reduce the uncertainty of the data-driven training of graph neural networks, this paper adopted wavelet transform to preprocess the data [39].Morlet wavelet basis was used to transform the data, and the bandpass filter ensured that it had good time-frequency domain localization characteristics and could accurately locate the frequency range.The expression is as follows: By expanding and translating at different scales a and amounts of displacement b , the wavelet family is obtained: where , in which f ψ is the center frequency of the wavelet and totalscale is the number of scales.After, the input signal x is transformed into a two-dimensional signal ( t , i a , value ), in which . The formula of continuous wavelet transform is: The data topology is constructed according to the prior knowledge, in which the fixed vertices are the standard features of the wavelet transform of each type of fault, the dynamic vertices are the data to be measured, and the graph data X is formed together.

Coupling Fault Diagnosis
The data were divided into the training set and the test set, and the partition ratio was 8:2.In this paper, the cross-entropy error was used as the error calculation method; the label _ y lable∈  was converted into one-hot code  For rotating machinery, the vibration signal characteristics are mainly determined by the vibration direction, rotation frequency, and vibration amplitude.In order to improve the interpretability of the method and reduce the uncertainty of the data-driven training of graph neural networks, this paper adopted wavelet transform to preprocess the data [39].Morlet wavelet basis was used to transform the data, and the bandpass filter ensured that it had good time-frequency domain localization characteristics and could accurately locate the frequency range.The expression is as follows: By expanding and translating at different scales a and amounts of displacement b, the wavelet family is obtained: where is the center frequency of the wavelet ψ(t) and totalscale is the number of scales.After, the input signal x is transformed into a two-dimensional signal (t, a i , value), in which value = abs(WT(a, b)).The formula of continuous wavelet transform is: The data topology is constructed according to the prior knowledge, in which the fixed vertices are the standard features of the wavelet transform of each type of fault, the dynamic vertices are the data to be measured, and the graph data X is formed together.

Coupling Fault Diagnosis
The data were divided into the training set and the test set, and the partition ratio was 8:2.In this paper, the cross-entropy error was used as the error calculation method; the label y_lable ∈ R was converted into one-hot code y ∈ R N and used to calculate the cross-entropy with the model output ŷ ∈ R N : The loss function consists of two parts, Loss1 and Loss2, where Loss1 is the classification error of the fault type to be diagnosed and Loss2 is the classification error of the fault type of other nodes.The total loss function is as follows.
Since the model focuses on the fault classification of unknown faults, we set α < 1 so that the accuracy of node classification was ensured first in this paper.

Algorithm Flow
The algorithm flow of the algorithm proposed in this paper is shown in Figure 3.
(a) The input data X 0 = X = (x 1 , x 2 , . . ., x N ), X 0 ∈ R N×d are composed of node features x n ∈ R d of N types of faults, where d is the feature dimension.The Nth node is the dynamic vertex (i.e., the fault node to be diagnosed).(b) After X 0 is transformed by wavelet, X 1 = Wavelet(X 0 ), X 1 ∈ R N× f ×S where f is the number of the frequency spectrum and S is the size of the wavelet scale.The wavelet transform raises the dimension of one-dimensional data, endows the data with more intuitive features, and at the same time, carries out data preprocessing, which is conducive to the subsequent feature extraction of the neural network.(c) After the signal is convolved on the two-dimensional spectrum data, the number of convolution nuclei is f , and X 2 ∈ R N× f is obtained.Through model training, it extracts the feature assignment in X 2 at each frequency and carries out standardization processing.In the process of CNN, only independent convolution processing is carried out for each node, that is, no data information from other nodes is involved.(d) Through the two-layer GCN network, it can be obtained that The feature extraction of highfrequency and low-frequency features is further carried out in the way of dichotomy, which is similar to the wavelet packet decomposition technology [40].At the same time, the specific features of each type of fault are extracted through model training.(e) Through the fully connected MLP, there is further dimensionality reduction of the data, X 5 = MLP(X 4 ), The loss function consists of two parts, 1 Loss and 2 Loss , where 1 Loss is the classification error of the fault type to be diagnosed and 2 Loss is the classification error of the fault type of other nodes.The total loss function is as follows.  so that the accuracy of node classification was ensured first in this paper.

Algorithm Flow
The algorithm flow of the algorithm proposed in this paper is shown in Figure 3.
(a) The input data ( , , , ) N types of faults, where d is the feature dimension.The th N node is the dynamic vertex (i.e., the fault node to be diagnosed).
where f is the number of the frequency spectrum and S is the size of the wavelet scale.The wavelet transform raises the dimension of one-dimensional data, endows the data with more intuitive features, and at the same time, carries out data preprocessing, which is conducive to the subsequent feature extraction of the neural network.(c) After the signal is convolved on the two-dimensional spectrum data, the number of convolution nuclei is f , and extracts the feature assignment in 2 X at each frequency and carries out standardization processing.In the process of CNN, only independent convolution processing is carried out for each node, that is, no data information from other nodes is involved.(d) Through the two-layer GCN network, it can be obtained that , where . The feature extraction of highfrequency and low-frequency features is further carried out in the way of dichotomy, which is similar to the wavelet packet decomposition technology [40].At the same time, the specific features of each type of fault are extracted through model training.
(e) Through the fully connected MLP, there is further dimensionality reduction of the data, 54  ()

Interpretability Analysis
Explainability is defined as the ability to explain to people in plain language.Graph neural network itself has distinct physical significance because of the topology of its nodes.The interpretability of the structure is made clear by assigning the input data with relation to the node.The DIGNN model proposed in this paper explores and optimizes the interpretability of the model in the specific application of coupling fault diagnosis, so this model is an interpretable model [41].
(a) Each node in G is a type of fault data, and the topological structure maintains the input structure from beginning to end.Each node has a clear physical meaning, and each node in the output data Ŷ corresponds to the classification of various types of fault data.(b) The vibration signal is converted into the time-frequency domain signal by wavelet transform, which provides the data with a clear physical meaning.Due to the introduction of data topology, the physical meaning of each vertex data X 1 , X 2 , X 3 , X 4 , X 5 remains stable, even if the data dimension is changed under the condition that the GCN network structure remains unchanged, and all of them are linear transformations of the vibration amplitude of this type of fault at a specific frequency.(c) The similar characteristics of similar faults in coupling faults are enhanced by aggregation operation: where N {x i } = x j e ij ∈ ε , and the fault characteristic information of related nodes is aggregated through the target vertex of GCN.The nodes at both ends are comprehensively considered by the symmetric standardization operation D − 1 2 AD − 1 2 .The degree of coupling fault is relatively large because it is related to multiple fault nodes, and the symmetric standardization can reduce the weight of such nodes in the aggregation, effectively avoiding the influence of unrelated fault type data on the aggregation of adjacent nodes.

Dataset Introduction
In this paper, the data in the XJTU Gearbox dataset were used.Four planetary gear failure modes and four bearing failure modes were injected in the experiment.As shown in Figure 4, injected gear failures included tooth wear, missing tooth, root cracks, and broken tooth.The injected bearing faults included ball bearing faults, inner race faults, outer race faults, and coupling faults of the above three bearing faults.Together with the normal state, a total of nine states of the type vibration signals were collected.The fault relationship topology is shown in Figure 5.
nodes.The interpretability of the structure is made clear by assigning the input data with relation to the node.The DIGNN model proposed in this paper explores and optimizes the interpretability of the model in the specific application of coupling fault diagnosis, so this model is an interpretable model [41].
(a) Each node in  is a type of fault data, and the topological structure maintains the input structure from beginning to end.Each node has a clear physical meaning, and each node in the output data Ŷ corresponds to the classification of various types of fault data.(b) The vibration signal is converted into the time-frequency domain signal by wavelet transform, which provides the data with a clear physical meaning.Due to the introduction of data topology, the physical meaning of each vertex data 1 2 3 4 5 , , , , X X X X X remains stable, even if the data dimension is changed under the condition that the GCN network structure remains unchanged, and all of them are linear transformations of the vibration amplitude of this type of fault at a specific frequency.(c) The similar characteristics of similar faults in coupling faults are enhanced by aggregation operation: where , and the fault characteristic information of related nodes is aggregated through the target vertex of GCN.The nodes at both ends are comprehensively considered by the symmetric standardization operation The degree of coupling fault is relatively large because it is related to multiple fault nodes, and the symmetric standardization can reduce the weight of such nodes in the aggregation, effectively avoiding the influence of unrelated fault type data on the aggregation of adjacent nodes.

Dataset Introduction
In this paper, the data in the XJTU Gearbox dataset were used.Four planetary gear failure modes and four bearing failure modes were injected in the experiment.As shown in Figure 4, injected gear failures included tooth wear, missing tooth, root cracks, and broken tooth.The injected bearing faults included ball bearing faults, inner race faults, outer race faults, and coupling faults of the above three bearing faults.Together with the normal state, a total of nine states of the type vibration signals were collected.The fault relationship topology is shown in Figure 5.In the experiment, transverse and longitudinal vibration sensors were installed to collect the state data.In this paper, radial and vertical vibration signals were selected for processing, and the sampling frequency was 20.48 kHz

Data Preprocessing
The data were preprocessed by wavelet transform before input into the neural network.As shown in Figure 6, the signals in the time-frequency domain of the bearings and gears were mainly concentrated within 8000 Hz, so the time-frequency domain signals in the frequency band from 0 to 8000 Hz were selected in this paper, and the time-frequency domain signals of various faults are shown in Figure 6.In the experiment, transverse and longitudinal vibration sensors were installed to collect the state data.In this paper, radial and vertical vibration signals were selected for processing, and the sampling frequency was 20.48 kHz

Data Preprocessing
The data were preprocessed by wavelet transform before input into the neural network.As shown in Figure 6, the signals in the time-frequency domain of the bearings and gears were mainly concentrated within 8000 Hz, so the time-frequency domain signals in the frequency band from 0 to 8000 Hz were selected in this paper, and the time-frequency domain signals of various faults are shown in Figure 6.In the experiment, transverse and longitudinal vibration sensors were installed to collect the state data.In this paper, radial and vertical vibration signals were selected for processing, and the sampling frequency was 20.48 kHz

Data Preprocessing
The data were preprocessed by wavelet transform before input into the neural network.As shown in Figure 6    It can be seen from Figure 6 that fault coupling is not a simple feature superposition, but contains a complex physical mechanism.In this paper, the data-driven graph neural network method was used to simulate the physical mechanism and achieve multi-fault coupling analysis.
After analyzing the fault types, the fault data topology was established according to the coupling relationship between faults.The graph structure established in this paper was an undirected graph containing self-loops, as shown in Figure 7.The data topology was composed of 10 nodes  In the training set data topology, the data X except the dynamic vertex 10 x was randomly sampled in the data sample corresponding to the fault type.In order to reduce the difficulty of test set creation and application, this paper took the average value of all kinds of known fault data after wavelet transform as the fixed vertices of the test set: 1 2 9 { , , .} x x x  .After wavelet transform, the data of each fault type were clearly divided in the data space, as shown in Figure 8, which helped to reduce the influence of uncertainty brought by the graph neural network.In the fault space, the distance between the coupling fault and its fault It can be seen from Figure 6 that fault coupling is not a simple feature superposition, but contains a complex physical mechanism.In this paper, the data-driven graph neural network method was used to simulate the physical mechanism and achieve multi-fault coupling analysis.
After analyzing the fault types, the fault data topology was established according to the coupling relationship between faults.The graph structure established in this paper was an undirected graph containing self-loops, as shown in Figure 7.The data topology was composed of 10 nodes V = {x 1 , x 2 , . . . ,x 10 }.Fault data were randomly selected in the dataset and filled with flexible nodes to form a training set, corresponding to the orange node in the figure, x 4 is the coupled fault data, and the other nodes are independent fault types.The data topology of the graph consisted of 34 edges e i,j ∈ ε, and the dynamic vertex was assumed to be related to all vertices because of its type and the unknown correlation with other vertices (i.e., e 1,10 , e 9,10 , . . ., e 10,10 = 1).

(g) Normal
(h) Root cracks (i) Tooth wear It can be seen from Figure 6 that fault coupling is not a simple feature superposition, but contains a complex physical mechanism.In this paper, the data-driven graph neural network method was used to simulate the physical mechanism and achieve multi-fault coupling analysis.
After analyzing the fault types, the fault data topology was established according to the coupling relationship between faults.The graph structure established in this paper was an undirected graph containing self-loops, as shown in Figure 7.The data topology was composed of 10 nodes  In the training set data topology, the data X except the dynamic vertex 10 x was randomly sampled in the data sample corresponding to the fault type.In order to reduce the difficulty of test set creation and application, this paper took the average value of all kinds of known fault data after wavelet transform as the fixed vertices of the test set: After wavelet transform, the data of each fault type were clearly divided in the data space, as shown in Figure 8, which helped to reduce the influence of uncertainty brought by the graph neural network.In the fault space, the distance between the coupling fault and its fault In the training set data topology, the data X except the dynamic vertex x 10 was randomly sampled in the data sample corresponding to the fault type.In order to reduce the difficulty of test set creation and application, this paper took the average value of all kinds of known fault data after wavelet transform as the fixed vertices of the test set: {x 1 , x 2 , . . . ,x 9 }.
After wavelet transform, the data of each fault type were clearly divided in the data space, as shown in Figure 8, which helped to reduce the influence of uncertainty brought by the graph neural network.In the fault space, the distance between the coupling fault and its fault type was relatively close, reflecting the coupling effect between faults.The obvious boundary between the bearing and gear faults was obtained only by wavelet transform.
type was relatively close, reflecting the coupling effect between faults.The obvious boundary between the bearing and gear faults was obtained only by wavelet transform.

Coupling Fault Diagnosis
First, the model proposed in this paper was used for coupling fault diagnosis of the data.The training set contained 540 samples, the validation set contained 180 samples, and the test set contained 180 samples.In each sample, which can be called a graph cluster, 10 x was selected randomly from nine fault types.The model adopted the SGD optimization method, the learning rate was 0.01, and the momentum was defined as 0.9.A convolutional neural network with a 100 × 1024 × 1024 filter was used in series with two layers of GCN networks.The filter size in the first layer of GCN was 1024 × 512 and the second GCN layer contained a 512 × 256 filter.
Batch normalization further standardized the time-domain information processing for each layer of the network output [42].While improving the operational efficiency and generalization of the neural network, by adding the BN layer after two GCN networks, the network accuracy rate increased from 87.78% to 100%, each step of the training time decreased from 87 s to 9.17 s, and the algorithm convergence speed had been greatly improved as shown in Figure 9.Each step was to perform serial operations on seven graph data, and each graph was optimized for gradient descent after classification.

Coupling Fault Diagnosis
First, the model proposed in this paper was used for coupling fault diagnosis of the data.The training set contained 540 samples, the validation set contained 180 samples, and the test set contained 180 samples.In each sample, which can be called a graph cluster, x 10 was selected randomly from nine fault types.The model adopted the SGD optimization method, the learning rate was 0.01, and the momentum was defined as 0.9.A convolutional neural network with a 100 × 1024 × 1024 filter was used in series with two layers of GCN networks.The filter size in the first layer of GCN was 1024 × 512 and the second GCN layer contained a 512 × 256 filter.Batch normalization further standardized the time-domain information processing for each layer of the network output [42].While improving the operational efficiency and generalization of the neural network, by adding the BN layer after two GCN networks, the network accuracy rate increased from 87.78% to 100%, each step of the training time decreased from 87 s to 9.17 s, and the algorithm convergence speed had been greatly improved as shown in Figure 9.Each step was to perform serial operations on seven graph data, and each graph was optimized for gradient descent after classification.
type was relatively close, reflecting the coupling effect between faults.The obvious boundary between the bearing and gear faults was obtained only by wavelet transform.

Coupling Fault Diagnosis
First, the model proposed in this paper was used for coupling fault diagnosis of the data.The training set contained 540 samples, the validation set contained 180 samples, and the test set contained 180 samples.In each sample, which can be called a graph cluster, 10 x was selected randomly from nine fault types.The model adopted the SGD optimization method, the learning rate was 0.01, and the momentum was defined as 0.9.A convolutional neural network with a 100 × 1024 × 1024 filter was used in series with two layers of GCN networks.The filter size in the first layer of GCN was 1024 × 512 and the second GCN layer contained a 512 × 256 filter.
Batch normalization further standardized the time-domain information processing for each layer of the network output [42].While improving the operational efficiency and generalization of the neural network, by adding the BN layer after two GCN networks, the network accuracy rate increased from 87.78% to 100%, each step of the training time decreased from 87 s to 9.17 s, and the algorithm convergence speed had been greatly improved as shown in Figure 9.Each step was to perform serial operations on seven graph data, and each graph was optimized for gradient descent after classification.Several models were used to diagnose coupling faults including ChebyNet, GCN, DVChebyNet, DVGraphSAGE, DVHOGNN, and the DIGNN proposed in the article.Among the models, except for ChebyNet and GCN, the models used a wavelet transform operation and consisted of a 100 × 1024 × 1024 CNN layer to convert the time-frequency domain data matrix into a one-dimension vector.These models used different graph neural network layers, and their network structure hyperparameters remained consistent.The accuracy of model diagnosis is shown in Figure 10.Due to the convolution operation on the input signal in the frequency domain, the ChebyNet had a faster convergence rate for the original data without wavelet transform.However, compared with the pre-processed model after wavelet transform, there was a large gap in accuracy.The GCN algorithm is a spatial domain graph neural network algorithm that has strong model fitting ability, but the time series data need to be iterated many times to discover the information contained in it.The graph neural network with wavelet transform can optimize the model input, avoid the uncertainty caused by pure data drive, and integrate its powerful model fitting ability to improve the accuracy of fault diagnosis.The DIGNN algorithm proposed in this paper had the highest accuracy in the validation set and showed great robustness in the test set.Additionally, it converged fastest among the benchmark models, as shown in Table 2.
Several models were used to diagnose coupling faults including ChebyNet, GCN, DVChebyNet, DVGraphSAGE, DVHOGNN, and the DIGNN proposed in the article.Among the models, except for ChebyNet and GCN, the models used a wavelet transform operation and consisted of a 100 × 1024 × 1024 CNN layer to convert the time-frequency domain data matrix into a one-dimension vector.These models used different graph neural network layers, and their network structure hyperparameters remained consistent.The accuracy of model diagnosis is shown in Figure 10.Due to the convolution operation on the input signal in the frequency domain, the ChebyNet had a faster convergence rate for the original data without wavelet transform.However, compared with the pre-processed model after wavelet transform, there was a large gap in accuracy.The GCN algorithm is a spatial domain graph neural network algorithm that has strong model fitting ability, but the time series data need to be iterated many times to discover the information contained in it.The graph neural network with wavelet transform can optimize the model input, avoid the uncertainty caused by pure data drive, and integrate its powerful model fitting ability to improve the accuracy of fault diagnosis.The DIGNN algorithm proposed in this paper had the highest accuracy in the validation set and showed great robustness in the test set.Additionally, it converged fastest among the benchmark models, as shown in Table 2.
The graph neural network extracts the correlation information of the coupled fault components.Then, the graph dataset is input into the trained model, and the fault features are achieved from the output of the first GCN layer and the output of the second GCN layer, respectively.The visualization effect is shown in Figure 11.In the output of the first GCN layer, as shown in Figure 11a, compared with Figure 8, the normal state is at the coordinate zero of the feature space.However, the correlation between the coupling fault and its fault components is unclear.The boundary between the bearing fault and gear  The graph neural network extracts the correlation information of the coupled fault components.Then, the graph dataset is input into the trained model, and the fault features are achieved from the output of the first GCN layer and the output of the second GCN layer, respectively.The visualization effect is shown in Figure 11.In the output of the first GCN layer, as shown in Figure 11a, compared with Figure 8, the normal state is at the coordinate zero of the feature space.However, the correlation between the coupling fault and its fault components is unclear.The boundary between the bearing fault and gear fault and the fault coupling mode are more clearly visualized in the second layer GCN output, as illustrated in Figure 11b.For bearings with coupling faults, 100 coupling fault samples were selected, and their output after passing through the Softmax layer of the DIGNN network is shown in Figure 12.First of all, it ensures the accuracy of the coupling fault classification, and then, the fault coupling information is mined, a benefit from the advantages of graph neural network correlation analysis, and the fault coupling mode is revealed in the classification results.For each coupling fault sample, the blue part indicates that the coupling degree of this type of fault is low.Other colors indicate that the fault type is highly coupled.The coupling fault is mainly reflected in the inner ring fault, followed by the ball and the outer ring fault being reflected, and the coupling with the gear fault is better avoided., where n F is the number of occurrences of each coupling fault, and X N is the number of samples (100 in this case).Then, except for the coupling fault 3 y , the classification accuracy rate of the other three types of coupling components is shown in Table 3.The comprehensive diagnosis accuracy is defined as , where n is the For bearings with coupling faults, 100 coupling fault samples were selected, and their output after passing through the Softmax layer of the DIGNN network is shown in Figure 12.First of all, it ensures the accuracy of the coupling fault classification, and then, the fault coupling information is mined, a benefit from the advantages of graph neural network correlation analysis, and the fault coupling mode is revealed in the classification results.For each coupling fault sample, the blue part indicates that the coupling degree of this type of fault is low.Other colors indicate that the fault type is highly coupled.The coupling fault is mainly reflected in the inner ring fault, followed by the ball and the outer ring fault being reflected, and the coupling with the gear fault is better avoided.For bearings with coupling faults, 100 coupling fault samples were selected, and their output after passing through the Softmax layer of the DIGNN network is shown in Figure 12.First of all, it ensures the accuracy of the coupling fault classification, and then, the fault coupling information is mined, a benefit from the advantages of graph neural network correlation analysis, and the fault coupling mode is revealed in the classification results.For each coupling fault sample, the blue part indicates that the coupling degree of this type of fault is low.Other colors indicate that the fault type is highly coupled.The coupling fault is mainly reflected in the inner ring fault, followed by the ball and the outer ring fault being reflected, and the coupling with the gear fault is better avoided., where n F is the number of occurrences of each coupling fault, and X N is the number of samples (100 in this case).Then, except for the coupling fault 3 y , the classification accuracy rate of the other three types of coupling components is shown in Table 3.The comprehensive diagnosis accuracy is defined as , where n is the Take the largest four items in the coupling fault classification vector Ŷ among 100 fault coupling samples, and define the coupling fault diagnosis accuracy rate as A n = F n /N X , where F n is the number of occurrences of each coupling fault, and N X is the number of samples (100 in this case).Then, except for the coupling fault y 3 , the classification accuracy rate of the other three types of coupling components is shown in Table 3.The comprehensive diagnosis accuracy is defined as A = ∑ A n /n, where n is the number of coupling fault components, and the comprehensive diagnosis accuracy of coupling fault in the dataset proposed in this paper was calculated at 88.3%, which can provide a reference for the analysis of coupling fault components.

Discussion and Conclusions
This paper presented a coupling fault diagnosis method based on a graph neural network.Through wavelet transform, a one-dimensional vibration signal was transformed into a time-frequency domain two-dimensional signal, which avoids the uncertainty caused by the data driven neural network.By establishing the fault coupling topology diagram, the coupling fault is diagnosed by using the graph neural network.The method proposed in this paper can achieve 100% accuracy of fault diagnosis under a single fault, and supports the mining of fault coupling information.The method proposed in this paper provides an interpretable data-driven method for fault coupling analysis, but the fault coupling dataset was limited, and more experiments are needed to verify the robustness of the proposed method.

.
Through k layers of the GCN network, the information of vertices within k hop −

Figure 1 .
Figure 1.Flowchart of a traditional GCN algorithm.

Figure 1 .
Figure 1.Flowchart of a traditional GCN algorithm.

Ny
∈ and used to calculate the cross-entropy with the model output ˆN y ∈ :
X 5 ∈ R N× f 8 .Finally, fault classification Ŷ is output through the Softmax layer.(f) In the process of model training, the output value ŷN of the last layer of node x N is used as the training label to optimize the model parameters.After the model is established, node x N will play the same role with other nodes in the model operation, which only carry out independent output in the model output phase.
focuses on the fault classification of unknown faults, we set 1

Figure 3 .
Figure 3. Flowchart of the DIGNN algorithm with dynamic vertex.Each color represents a fault type.

Figure 3 .
Figure 3. Flowchart of the DIGNN algorithm with dynamic vertex.Each color represents a fault type.
Mix (e) Broken tooth (f) Missing tooth

Figure 5 .
Figure 5. Topology of the coupling faults.

Figure 5 .
Figure 5. Topology of the coupling faults.
, the signals in the time-frequency domain of the bearings and gears were mainly concentrated within 8000 Hz, so the time-frequency domain signals in the frequency band from 0 to 8000 Hz were selected in this paper, and the time-frequency domain signals of various faults are shown in Figure 6.
Mix (e) Broken tooth (f) Missing tooth

Figure 6 .
Figure 6.Time-frequency feature extraction of the fault data.

.
Fault data were randomly selected in the dataset and filled with flexible nodes to form a training set, corresponding to the orange node in the figure, 4x is the coupled fault data, and the other nodes are independent fault types.The data topology of the graph consisted of 34 edges , i j e ∈ , and the dynamic vertex was assumed to be related to all vertices because of its type and the unknown correlation with other vertices (i.e.,

Figure 7 .
Figure 7. Topology and relationship of the coupling fault data with self-loops.

Figure 6 .
Figure 6.Time-frequency feature extraction of the fault data.

Figure 6 .
Figure 6.Time-frequency feature extraction of the fault data.

.
Fault data were randomly selected in the dataset and filled with flexible nodes to form a training set, corresponding to the orange node in the figure, 4x is the coupled fault data, and the other nodes are independent fault types.The data topology of the graph consisted of 34 edges , i j e ∈ , and the dynamic vertex was assumed to be related to all vertices because of its type and the unknown correlation with other vertices (i.e.,

Figure 7 .
Figure 7. Topology and relationship of the coupling fault data with self-loops.

Figure 7 .
Figure 7. Topology and relationship of the coupling fault data with self-loops.

Figure 8 .
Figure 8.Fixed vertex feature in feature space.

Figure 8 .
Figure 8.Fixed vertex feature in feature space.

Figure 8 .
Figure 8.Fixed vertex feature in feature space.

Figure 11 .
Figure 11.Visualization of the coupling fault characteristics.(a) Feature space after the first GCN layer; (b) feature space after the second GCN layer.

Figure 12 .
Figure 12.Main components of coupling faults based on the dynamic vertex output vector.Faults that may not be coupling factors are indicated in blue, and the three most possible coupling faults are highlighted in other colors.Take the largest four items in the coupling fault classification vector Ŷ among 100 fault coupling samples, and define the coupling fault diagnosis accuracy rate as / n n X A F N =

Figure 11 .
Figure 11.Visualization of the coupling fault characteristics.(a) Feature space after the first GCN layer; (b) feature space after the second GCN layer.

Figure 11 .
Figure 11.Visualization of the coupling fault characteristics.(a) Feature space after the first GCN layer; (b) feature space after the second GCN layer.

Figure 12 .
Figure 12.Main components of coupling faults based on the dynamic vertex output vector.Faults that may not be coupling factors are indicated in blue, and the three most possible coupling faults are highlighted in other colors.Take the largest four items in the coupling fault classification vector Ŷ among 100 fault coupling samples, and define the coupling fault diagnosis accuracy rate as / n n X A F N =

Figure 12 .
Figure 12.Main components of coupling faults based on the dynamic vertex output vector.Faults that may not be coupling factors are indicated in blue, and the three most possible coupling faults are highlighted in other colors.
Zhang et al. proposed a graph neural network (GNN) method based on the Granger causality test, which decomposed vibration signals into noise signals and fault signals.The graph data were established for nodes based on fault types to classify faults

Table 2 .
Comparison of the coupling fault diagnosis performance.

Table 2 .
Comparison of the coupling fault diagnosis performance.

Table 3 .
Accuracy of the diagnosis of main components constituting coupling fault.